
Manuscript: Randomization Inference with Rainfall Data: Using Historical Weather Patterns for Variance Estimation
Author: Alicia Cooperman
Date: April 18, 2017
Summary: This folder contains the data and R scripts necessary to replicate all analyses and figures in the paper 'Randomization Inference with Rainfall Data: Using Historical Weather Patterns for Variance Estimation' and its Supplementary Materials. This file describes the contents of the folder.

The contents of the replication folder are the following:

1. cooperman_dataset.Rdata: The data. See below for details.

2. cooperman_ri.R: The script that conducts the randomization inference procedure to calculate the sampling distribution of ATEs under each clustering assumption and rainfall measure. It also calculates p-values for the RI procedure using different numbers of simulations for the Appendix. The script is time-consuming and creates large files. Outputs are already included in the folder.

3. cooperman_main.R: The script that creates all Tables and Figures in the main text.

4. cooperman_appendix.R: The script that creates all Tables and Figures in the supplementary materials.

5. ate_sims.Rdata: List of vectors of simulated ATEs using randomization inference code. Output of cooperman_ri.R. See below for description and naming conventions.

6. pvals_by_nsims.Rdata: Output of cooperman_ri.R. See below for description.
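
The kind of procedure that cooperman_ri.R carries out can be illustrated with a toy example. The sketch below is not the author's code. It assumes that each simulated assignment draws one historical year per cluster and gives every county in that cluster its own rainfall from that year, and it estimates the ATE with a simple regression without covariates. All object names and data are hypothetical.

```r
set.seed(42)

# Hypothetical toy data: 6 counties in 3 states, with historical election-day
# rainfall for 5 years (rows = counties, columns = years).
counties <- data.frame(county = 1:6, state = rep(c("A", "B", "C"), each = 2))
hist_rain <- matrix(runif(6 * 5, 0, 2), nrow = 6,
                    dimnames = list(NULL, paste0("y", 1:5)))
turnout <- rnorm(6, mean = 60, sd = 5)  # outcome held fixed across simulations

# One simulated assignment (assumed mechanism): each cluster draws a
# historical year, and each county receives its own rainfall from that year.
draw_assignment <- function(cluster) {
  clusters <- unique(cluster)
  yr <- sample(ncol(hist_rain), length(clusters), replace = TRUE)
  names(yr) <- clusters
  idx <- cbind(seq_along(cluster), unname(yr[as.character(cluster)]))
  hist_rain[idx]
}

# Simulated ATE: slope from regressing the outcome on simulated rainfall.
sim_ate <- function(cluster) {
  rain <- draw_assignment(cluster)
  unname(coef(lm(turnout ~ rain))[2])
}

ate_state <- replicate(1000, sim_ate(counties$state))   # clustered by state
ate_indep <- replicate(1000, sim_ate(counties$county))  # counties independent

# Spread of the simulated sampling distribution under each assumption.
print(c(state = sd(ate_state), indep = sd(ate_indep)))
```

Passing a different grouping variable to sim_ate() changes only the level at which years are drawn, which is how one function can cover several clustering assumptions in this sketch.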


Note on Computation Time:
A full replication will take a week or longer on a standard laptop. The datasets for creating the 1000 potential weather assignments for each specification are also very large, up to 1.5GB for some of the Index datasets. The vectors of average treatment effects for each specification are included as a separate .Rdata file to facilitate replication. The code is annotated to explain which steps can be skipped as needed. 
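
Because the outputs are included, they can be inspected without rerunning cooperman_ri.R. The sketch below assumes only that the files are standard .Rdata files. It first writes a toy stand-in file so that it runs on its own; the object name `ate_sims` and the vector names inside it are hypothetical and may not match the real file.

```r
# Toy stand-in for ate_sims.Rdata so the example is self-contained.
# The object name and the vector names are hypothetical.
toy_path <- file.path(tempdir(), "toy_ate_sims.Rdata")
ate_sims <- list(indep = rnorm(1000), state = rnorm(1000))
save(ate_sims, file = toy_path)
rm(ate_sims)

# Load into a separate environment so nothing in the workspace is
# overwritten, then list what the file actually contains.
out <- new.env()
loaded <- load(toy_path, envir = out)
print(loaded)                           # names of the objects in the file
str(out[[loaded[1]]], max.level = 1)    # top-level structure of the first one
```

For the real files, replace toy_path with "ate_sims.Rdata" or "pvals_by_nsims.Rdata".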

____________________________________________________________

## Dataset ##

cooperman_dataset.Rdata - The data. The dataset contains replication materials from Gomez et al. (2007), accessed at http://myweb.fsu.edu/bgomez/research.html on May 18, 2015.

The following variables were calculated by the author, with details in the Supplementary Materials: Rainfall12, Rainfall12.index, Rainfall12.spi, Rainfall100. 

Variables with "12" in the name were calculated with a Kriging procedure with 12 nearest data points, and the variable with "100" in the name was calculated with a Kriging procedure with 100 nearest data points. All analysis was done with the 12 nearest data points, but the variable with 100 nearest data points is described in the summary statistics table. See Supplementary Materials Appendix for details on data processing.

The following variables are from the Gomez et al. (2007) replication materials: Rain, Snow, Turnout, PcntBlack, ZPcntHSGrad, FarmsPerCap, Unemploy, AdjIncome, Closing, Property, Literacy, PollTax, Motor, GubElection, SenElection, Turnout.Lag. These variables are used as covariates in the analysis. See the Gomez et al. (2007) replication materials for details. 

The following variables were added by the author and refer to geographic areas: State, CWA_short, REG. State is the state, CWA_short refers to the County Warning Area, and REG refers to the weather region. CWA and Region are National Weather Service designations. Where the CWA designation for a county was 6 or 9 letters, the first 3 letters were used. This occurred when a county fell within multiple CWAs. Contact author for details on data cleaning procedure.
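
The truncation rule for CWA designations can be sketched in R as follows. The codes are made-up placeholders, not real CWA identifiers, and the variable names are hypothetical.

```r
# Hypothetical raw CWA codes: a 3-letter code for a county in a single CWA,
# and 6- and 9-letter codes for counties that fall within multiple CWAs.
cwa_raw <- c("ABC", "ABCDEF", "GHIJKLMNO")

# Keep the first 3 letters, as described above.
cwa_short <- substr(cwa_raw, 1, 3)
print(cwa_short)
```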

____________________________________________________________

## Data Output from cooperman_ri.R ##

1. ate_sims.Rdata - a data file that is a list with vectors of 1000 average treatment effects estimated using the simulated weather assignments. 

- Vectors with "indep" in the name refer to counties being independent of one another. Vectors with "cwa", "state", "reg", and "US" in the name refer to independence at the County Warning Area, State, Region, and US levels, respectively. 
- Vectors with "index" in the name refer to the rainfall index measure. Vectors with "spi" in the name refer to the rainfall SPI measure. Vectors without either of these in the name refer to the rainfall inches measure.
- Vectors with "pre2001" in the name refer to the robustness check in the Supplementary Materials - Appendix, where weather data after 2000 were excluded from potential weather assignments.
- Vectors with "FE" in the name refer to the robustness check in the Supplementary Materials - Appendix, which uses a fixed effects model.
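
The naming conventions above can be applied programmatically to sort vectors by rainfall measure and robustness check. This is a minimal sketch. The names in `sim_names` are hypothetical examples built from the substrings described above, and the real names in ate_sims.Rdata may be formatted differently.

```r
# Hypothetical vector names; the real names in ate_sims.Rdata may differ.
sim_names <- c("indep", "cwa", "state_index", "reg_spi",
               "US_pre2001", "state_FE")

# Rainfall measure: "index" or "spi" if present in the name, else inches.
is_index <- grepl("index", sim_names, fixed = TRUE)
is_spi   <- grepl("spi", sim_names, fixed = TRUE)
measure  <- ifelse(is_index, "index", ifelse(is_spi, "spi", "inches"))

# Flags for the two robustness checks.
catalog <- data.frame(
  name    = sim_names,
  measure = measure,
  pre2001 = grepl("pre2001", sim_names, fixed = TRUE),
  FE      = grepl("FE", sim_names, fixed = TRUE)
)
print(catalog)
```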

2. pvals_by_nsims.Rdata - a data file for replicating the results in Appendix Table 7 and Figure 1. It is a matrix of p-values based on estimated ATEs for the Rainfall (inches) and US-Year specifications using different numbers of simulations. Each column refers to the number of simulations used to calculate the p-value. There are 50 rows.
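
To illustrate how a p-value depends on the number of simulations, the sketch below builds a matrix with the same layout on simulated data. It rests on two assumptions not confirmed by this file: that the p-value is two-sided, and that each of the 50 rows is an independent batch of simulations. The observed ATE, the null distribution, and the column values are all hypothetical.

```r
set.seed(1)

obs_ate <- -0.8                     # hypothetical observed ATE
n_sims  <- c(50, 100, 500, 1000)    # hypothetical numbers of simulations
n_rows  <- 50

# Two-sided RI p-value (assumed): share of simulated ATEs at least as
# extreme in absolute value as the observed ATE.
ri_pvalue <- function(obs, sims) mean(abs(sims) >= abs(obs))

# One column per number of simulations, one row per repeated batch.
# Simulated ATEs are drawn from a normal distribution purely for illustration.
pvals <- sapply(n_sims, function(n) {
  replicate(n_rows, ri_pvalue(obs_ate, rnorm(n, mean = 0, sd = 0.5)))
})
colnames(pvals) <- n_sims

print(dim(pvals))
print(apply(pvals, 2, sd))          # variability of the p-value by column
```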

____________________________________________________________

## References ##

Gomez, Brad T., Thomas G. Hansford, and George A. Krause. 2007. "The Republicans Should Pray for Rain: Weather, Turnout, and Voting in U.S. Presidential Elections." Journal of Politics 69 (August): 649-663.
